A systematic comparison and evaluation of biclustering methods for gene expression data
Abstract
Motivation: In recent years, there have been various efforts to overcome the limitations of standard clustering approaches for the analysis of gene expression data by grouping genes and samples simultaneously. The underlying concept, often referred to as biclustering, makes it possible to identify sets of genes that share compatible expression patterns across subsets of samples, and its usefulness has been demonstrated for different organisms and datasets. Several biclustering methods have been proposed in the literature; however, it is not clear how the different techniques compare with each other with respect to the biological relevance of the clusters, or with respect to other characteristics such as robustness and sensitivity to noise. Accordingly, no guidelines concerning the choice of biclustering method are currently available.
Results: First, this paper provides a methodology for comparing and validating biclustering methods that includes a simple binary reference model. Although this model captures the essential features of most biclustering approaches, it is still simple enough that all optimal groupings can be determined exactly; to this end, we propose a fast divide-and-conquer algorithm (Bimax). Second, we evaluate the performance of five salient biclustering algorithms together with the reference model and a hierarchical clustering method on various synthetic and real datasets for Saccharomyces cerevisiae and Arabidopsis thaliana. The comparison reveals that (1) biclustering in general has advantages over a conventional hierarchical clustering approach, (2) there are considerable performance differences between the tested methods and (3) even the simple reference model delivers relevant patterns within all considered settings.
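To make the binary reference model more concrete, the following is a minimal sketch, not the Bimax algorithm itself. It assumes the expression matrix has already been discretized to 0/1 values (rows are genes, columns are samples) and that an "optimal grouping" means an all-ones submatrix that cannot be enlarged by adding a further row or column. The function name `maximal_biclusters` and the toy matrix are hypothetical, and the enumeration is deliberately brute force (exponential in the number of rows), so it only illustrates what an exact method would have to return on a tiny input.

```python
from itertools import combinations


def maximal_biclusters(matrix):
    """Enumerate inclusion-maximal all-ones submatrices of a 0/1 matrix.

    Brute-force illustration only: every non-empty row subset is tried,
    so this is usable on toy inputs, not on real expression data.
    Returns a sorted list of (row_indices, column_indices) pairs.
    """
    n_rows = len(matrix)
    n_cols = len(matrix[0]) if matrix else 0
    found = set()
    for size in range(1, n_rows + 1):
        for rows in combinations(range(n_rows), size):
            # Columns in which every chosen row has a 1.
            cols = frozenset(
                j for j in range(n_cols) if all(matrix[i][j] for i in rows)
            )
            if not cols:
                continue
            # Extend the row set with every row that is 1 on all those
            # columns; the resulting pair can no longer be enlarged.
            closed_rows = frozenset(
                i for i in range(n_rows) if all(matrix[i][j] for j in cols)
            )
            found.add((closed_rows, cols))
    return sorted((sorted(r), sorted(c)) for r, c in found)


if __name__ == "__main__":
    # Hypothetical binarized matrix: 4 genes (rows) x 4 samples (columns).
    toy = [
        [1, 1, 0, 0],
        [1, 1, 1, 0],
        [0, 1, 1, 0],
        [0, 0, 0, 1],
    ]
    for genes, samples in maximal_biclusters(toy):
        print("genes", genes, "samples", samples)
```

Because overlapping biclusters are allowed, a single gene can appear in several groups here, which is one of the properties that distinguishes biclustering from a conventional partition of the genes.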
Related resources
Comparison of Biclustering Methods: A Systematic Comparison and Evaluation of Biclustering Methods for Gene Expression Data
Motivation: In recent years, there have been various efforts to overcome the limitations of standard clustering approaches for the analysis of gene expression data by grouping genes and samples simultaneously. The underlying concept, which is often referred to as biclustering, makes it possible to identify sets of genes sharing compatible expression patterns across subsets of samples, and its usefulness h...
Application of biclustering with the "Large Average Submatrices" method to gene expression data from DNA microarrays
Background and Objective: In recent years, DNA microarray technology has become a central tool in genomic research. Using this technology, which made it possible to simultaneously analyze expression levels for thousands of genes under different conditions, massive amounts of information will be obtained. While traditional clustering methods, such as hierarchical and K-means clustering have been...
BiFree: An Efficient Biclustering Technique for Gene Expression Data Using Two Layer Free Weighted Bipartite Graph Crossing Minimization
Conventional clustering techniques for gene expression data provide a global view of the data. From the biological perspective, a local view is essential for better analysis of gene expression data with simultaneous grouping of genes and conditions. Several biclustering techniques have been proposed in the literature based on different problem formulations. Therefore, it is difficult to compare th...
Improved biclustering of microarray data demonstrated through systematic performance tests
A new algorithm is presented for fitting the plaid model, a biclustering method developed for clustering gene expression data. The approach is based on speedy individual differences clustering and uses binary least squares to update the cluster membership parameters, making use of the binary constraints on these parameters and simplifying the other parameter updates. The performance of both algor...
BiRange: An Efficient Framework for Biclustering of Gene Expression Data Using Range Bipartite Graph
Biclustering is a vital data mining tool which is commonly employed on microarray data sets for analysis tasks in bioinformatics research and medical applications. There has been extensive research on biclustering of gene expression data arising from microarray experiments. This technique is an important analysis tool in gene expression measurement, when some genes have multiple functions and e...
Journal title:
- Bioinformatics
Volume 22, Issue 9
Pages: -
Publication date: 2006